Using Document-Quality Measures to Predict Web-Search Effectiveness
Abstract
The query-performance prediction task is to estimate retrieval effectiveness in the absence of relevance judgments. The task becomes highly challenging over the Web due to, among other reasons, the effect of low-quality (e.g., spam) documents on retrieval performance. To address this challenge, we present a novel prediction approach that utilizes query-independent document-quality measures. While using these measures was shown to improve Web-retrieval effectiveness, this is the first study demonstrating the clear merits of using them for query-performance prediction. Evaluation performed with large-scale Web collections shows that our methods post prediction quality that often surpasses that of state-of-the-art predictors, including those devised specifically for Web
Similar resources
An Ensemble Click Model for Web Document Ranking
Every year, web search engine providers spend more and more money on ranking documents in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
Comparison of Different Distance Measures on Hierarchical Document Clustering in 2-Pass Retrieval
Hierarchic document clustering has been applied to search results (query-specific clustering) on the grounds of its potentially improved effectiveness compared both to that of static clustering and of conventional inverted file search (IFS). In this paper we review and compare the effects of seven different measures of similarity among documents in hierarchic query-specific clustering. We have c...
Document ranking using web evidence
Evidence based on web graph structure is reportedly used by the current generation of World-Wide Web (WWW) search engines to identify “high-quality”, “important” pages and to reject “spam” content. However, despite the apparent wide use of this evidence, its application in web-based document retrieval is controversial. Confusion exists as to how to incorporate web evidence in document ranking, a...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide results sorted according to the user’s requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm that uses reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, the Web service is one of the most important services on the Internet, and it has had the greatest impact on the spread of the Internet in human societies. Internet penetration has been an effective factor in the growth of the volume of information on the Web. The massive growth of informat...